Anytime Planning for Decentralized POMDPs using Expectation Maximization
نویسندگان
چکیده
Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While finite-horizon DECPOMDPs have enjoyed significant success, progress remains slow for the infinite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infinite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers.
منابع مشابه
Message-passing algorithms for large structured decentralized POMDPs
Decentralized POMDPs provide a rigorous framework for multi-agent decision-theoretic planning. However, their high complexity has limited scalability. In this work, we present a promising new class of algorithms based on probabilistic inference for infinite-horizon ND-POMDPs—a restricted Dec-POMDP model. We first transform the policy optimization problem to that of likelihood maximization in a ...
متن کاملExpectation Maximization for Average Reward Decentralized POMDPs
Planning for multiple agents under uncertainty is often based on decentralized partially observable Markov decision processes (DecPOMDPs), but current methods must de-emphasize long-term effects of actions by a discount factor. In tasks like wireless networking, agents are evaluated by average performance over time, both short and longterm effects of actions are crucial, and discounting based s...
متن کاملImproved Planning for Infinite-Horizon Interactive POMDPs using Probabilistic Inference (Extended Abstract)
We provide the first formalization of self-interested multiagent planning using expectation-maximization (EM). Our formalization in the context of infinite-horizon and finitely-nested interactivePOMDP (I-POMDP) is distinct from EM formulations for POMDPs and other multiagent planning frameworks. Specific to I-POMDPs, we exploit the graphical model structure and present a new approach based on b...
متن کاملEfficient Planning for Factored Infinite-Horizon DEC-POMDPs
Decentralized partially observable Markov decision processes (DEC-POMDPs) are used to plan policies for multiple agents that must maximize a joint reward function but do not communicate with each other. The agents act under uncertainty about each other and the environment. This planning task arises in optimization of wireless networks, and other scenarios where communication between agents is r...
متن کاملMonte-Carlo Expectation Maximization for Decentralized POMDPs
We address two significant drawbacks of state-ofthe-art solvers of decentralized POMDPs (DECPOMDPs): the reliance on complete knowledge of the model and limited scalability as the complexity of the domain grows. We extend a recently proposed approach for solving DEC-POMDPs via a reduction to the maximum likelihood problem, which in turn can be solved using EM. We introduce a model-free version ...
متن کامل